For a long time, women philosophers were largely absent from the published record, and most of the famous philosophy texts were written by men. In this report, I compare texts written by the men and women of philosophy, as well as texts written about them, such as biographies and reviews. The comparison techniques are sentiment analysis and content overlap.
The main dataset used for this analysis can be found at https://www.kaggle.com/kouroshalizadeh/history-of-philosophy. It contains over 300,000 sentences from over 50 texts spanning 10 major schools of philosophy. The represented schools are: Plato, Aristotle, Rationalism, Empiricism, German Idealism, Communism, Capitalism, Phenomenology, Continental Philosophy, and Analytic Philosophy.
Two other datasets that I source for this analysis come from Wikipedia and Google search results. For Wikipedia, I create a dataset that contains each author's Wikipedia bio page, leveraging the wikipedia Python package. (Example for Simone de Beauvoir: https://en.wikipedia.org/wiki/Simone_de_Beauvoir). The expectation is that authors' Wikipedia pages will be an objective source on their lives regardless of sex. The Google dataset contains the content of the first 10 websites that appear from a search (Example: "philosopher Beauvoir review"). It is scraped from the internet using BeautifulSoup. I expect the Google dataset to be the most subjective, since we are searching for reviews and the review authors' opinions will be clear.
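As a rough illustration of the text-extraction step in scraping, the sketch below pulls the visible text out of an HTML string. It is not the code used for this report: the report uses BeautifulSoup, whereas this sketch substitutes Python's standard-library `html.parser` so that it runs without extra packages. The names `VisibleTextParser` and `extract_text` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside <script> or <style> tags."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page content; a real run would first download each search result.
sample_html = """
<html><head><title>Review</title><style>p {color: red;}</style></head>
<body><h1>The Second Sex</h1><p>A landmark work.</p>
<script>console.log("ignored");</script></body></html>
"""
page_text = extract_text(sample_html)
print(page_text)
```

The extracted text would then go through the same cleaning steps as the other datasets.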
Methodologies used in this report include web scraping, sentiment analysis, and word clouds. Web scraping is the process of using automated code to extract content and data from a website; it is used here to collect the Google data. Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text. Opinions are typically categorized as positive, negative, or neutral and can be used to determine the writer's attitude towards a particular topic. A word cloud is a collection of words visualized in different sizes. The bigger and bolder a word appears, the more often it is mentioned within a text.
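To make the idea of sentiment scoring concrete, here is a minimal lexicon-based sketch. It is an assumption for illustration only: the word lists, the function names `simple_sentiment` and `label`, and the threshold are hypothetical, and this is not necessarily how the simple, TextBlob, or VADER scores in this report are computed.

```python
import re

# Tiny hypothetical lexicons for illustration; real tools ship much larger ones.
POSITIVE_WORDS = {"good", "great", "brilliant", "pioneer", "masterpiece"}
NEGATIVE_WORDS = {"bad", "lazy", "bitter", "poor", "dull"}


def simple_sentiment(text):
    """Score text in [-1, 1] as (positive - negative) / matched opinion words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    matched = positives + negatives
    if matched == 0:
        return 0.0  # no opinion words found, so treat the text as neutral
    return (positives - negatives) / matched


def label(score, threshold=0.05):
    """Map a numeric score to positive, negative, or neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for sentence in ["A brilliant pioneer.", "Lazy and bitter.", "The table is brown."]:
    score = simple_sentiment(sentence)
    print(sentence, score, label(score))
```

A score near zero means either that no opinion words were found or that positive and negative words balance out.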
Data cleaning methodologies used are removing special characters, removing stopwords, and stemming. The same methodology is applied to all datasets. Removing special characters does exactly what it says: it removes special characters from the text. Stopwords are English words that do not add much meaning to a sentence. They can safely be ignored without sacrificing the meaning of the sentence. Stemming is the process of reducing a word to its stem, the base form to which prefixes and suffixes attach.
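The three cleaning steps can be sketched roughly as follows. This is a simplified illustration, not the implementation in ../lib: the stopword list is a small hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Small hypothetical stopword list; libraries such as NLTK provide fuller ones.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to"}
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def remove_special_characters(text):
    """Lowercase the text and replace anything that is not a letter or whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Apply all three steps and return the list of stemmed tokens."""
    tokens = remove_special_characters(text).split()
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]


cleaned = clean("The philosophers are questioning the nature of truth!")
print(cleaned)  # ['philosopher', 'question', 'nature', 'truth']
```

Stopwords are removed before stemming here so that the stopword lookup sees the original word forms.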
Further technical and specific explanations of these methodologies and their applications to this report can be found in ../lib/README.md.
%run ../lib/utils.ipynb
The data contain 13 schools of philosophy. The table below summarizes the number of authors, titles, records, and schools by sex:
display_descriptive_counts()
| sex | authors | titles | records | schools |
|---|---|---|---|---|
| female | 3 | 3 | 18635 | 1 |
| male | 32 | 55 | 339083 | 12 |
There is far more data for men than for women. There are only 3 female authors in the data, compared with 32 male authors, and the men account for about 95% of the records. The male authors collectively have works in every school except feminism, while the female authors write in only one school: feminism.
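The 95% figure follows directly from the record counts in the table above, as this short calculation shows (the variable names are only illustrative):

```python
# Record counts taken from the table above.
records = {"female": 18635, "male": 339083}

total = sum(records.values())
shares = {sex: count / total for sex, count in records.items()}

for sex, share in shares.items():
    print(f"{sex}: {share:.1%}")
# female: 5.2%
# male: 94.8%
```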
Below is a timeline of each author's first publication, separated by sex.
plot_timelines()
We can see that women came to the world of philosophy (in this dataset) much later than men. The male philosophers date all the way back to Plato, around 350 BC. Although men have published work much earlier than women, only 4 authors have works published before the 1600s. Most men have publication dates after the 1600s, and many after the 1900s. All of the women's work was published after the 1750s.
Below is a visualization of the average sentiment scores by sex for each data source (philosophy text data, Wikipedia bio data, and Google search data). We are interested in whether one data source or sex has a dramatically different sentiment score compared to the others.
First, we take a look at sentiment scores from the philosophy data (figure below). These scores reflect the authors' average sentiment in their own works.
plot_senti_scores_data()
We can see that women have slightly less positive scores than men, though in general every score in this chart is close to neutral.
Next, we take a look at the sentiment scores for the wikipedia data. We expect this source to be the most objective since it is just a biography.
plot_senti_scores_wiki()
It is very interesting that the simple and VADER sentiment scores are much lower for women than for men. We expected the Wikipedia biographies to be the most objective data source of all, so this large difference is surprising. One possible explanation is that the men's biographies contain more positive words such as "pioneer", since the men are treated as more foundational to the subject of philosophy than the women.
Lastly, we take a look at sentiment scores from the Google search data (below). This data should logically be the most subjective of them all, since we searched Google for reviews. The scores will probably reflect each article author's opinion of the philosopher.
plot_senti_scores_google()
The VADER sentiment score is about the same for men and women. For the TextBlob and simple sentiment scores, women have slightly more positive scores than men. This suggests that women's work in philosophy is received at least as positively as men's work.
Word clouds are a great tool to see which words are most popular in texts. The larger words are used more commonly. Word clouds are like the stars - the longer you look, the more you'll see!
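Under the hood, a word cloud is driven by word counts. The sketch below shows one plausible way to count words and scale them to font sizes. It is an assumption for illustration: the function names, the linear scaling, and the sample sentence are hypothetical and are not taken from the plotting code used in this report.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common words with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common(top_n)


def font_sizes(frequencies, min_size=10, max_size=40):
    """Scale counts linearly so the most frequent word gets max_size."""
    max_count = max(count for _, count in frequencies)
    return {
        word: min_size + (max_size - min_size) * count / max_count
        for word, count in frequencies
    }


# Toy sentence, not taken from the dataset.
sample = "reason and truth; truth and reason; truth above all"
top_words = word_frequencies(sample, top_n=3)
sizes = font_sizes(top_words)
print(top_words)
print(sizes)
```

In practice the text would be cleaned first, so that stopwords such as "and" do not dominate the cloud.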
We first compare the word clouds of the philosophy data for men and women.
wordcloud_data()
It is interesting to see that the men's works very often use objective words such as "object", "reason", "truth", and "fact", and firm words such as "must". The women's works more often use subjective words such as "situation" and "sometimes", and soft words such as "lover". This may suggest that women are writing from a more subjective perspective. It is also interesting that "man" is mentioned a lot in both sexes' works, but "women" is mentioned a lot only in the women's works.
Next we look at word clouds for the authors' wikipedia pages below.
wordcloud_wikipedia()
These word clouds look much different compared to the philosophy data word clouds. Both word clouds have words that you'd typically see in a biography: countries of origin, "first" if the author is doing something novel or their first work, "book", author names, etc.
wordcloud_google()
The google word clouds look pretty similar to the wikipedia word clouds.
Lastly, we'll take a look at the same word clouds but in venn diagrams. It will be interesting to see the overlap between sexes by source.
First, we'll look at the philosophy data.
venn_wordcloud_data()
It looks like many of the women's words are found in the men's text, but the opposite is not true. One word that appears only in the women's texts is "statistic", which is very interesting. This may suggest that the women rely on statistics in their texts more than the men do. However, the word "quantitatively" appears quite often in both sets of texts.
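The regions of a word-overlap venn diagram like this one can be expressed as set operations on each group's most frequent words. The sketch below is a simplified assumption of how such regions might be computed: the function names and the toy token lists are hypothetical and are not the report's data or plotting code.

```python
from collections import Counter


def top_words(tokens, n):
    """Return the set of the n most frequent tokens."""
    return {word for word, _ in Counter(tokens).most_common(n)}


def venn_regions(tokens_a, tokens_b, n=50):
    """Split the top words of two groups into the three venn regions."""
    a, b = top_words(tokens_a, n), top_words(tokens_b, n)
    return {"only_a": a - b, "both": a & b, "only_b": b - a}


# Toy token lists for illustration only.
group_a = ["reason", "truth", "man", "reason"]
group_b = ["situation", "man", "woman", "man"]
regions = venn_regions(group_a, group_b, n=3)
print(regions)
```

Note that the result depends on `n`: a word counted as unique to one group may simply fall just outside the other group's top `n`.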
Next, we'll look at the wikipedia data.
venn_wordcloud_wiki()
In the Wikipedia data we see the same relationship as in the philosophy data: many of the women's words are found in the men's text, but the opposite is not true. Here we see some words that appear only in the women's pages and not in the men's, and that we didn't see before (or not as large), such as "motherhood" and "futility". A new word that appears often (large and in the middle) compared to the philosophy data is "feminism". It is interesting that "feminism" does not overlap in the philosophy data above but does overlap here in the Wikipedia pages. A word that appears in the men's Wikipedia pages but not in the women's is "science"; interestingly, this pattern does not appear in the texts themselves. This may suggest that women's philosophical works are not perceived to be as scientific as men's works. This is especially interesting considering that in the previous venn diagram, the word "statistic" appeared only in women's texts.
Lastly, we will look at the word cloud venn diagram for the google search texts. We hypothesized that google would be the most subjective data source of them all.
venn_wordcloud_google()
We see a number of new patterns here. Of interest are the extremely positive and negative words that we didn't see before. This is expected, since these are subjective articles reviewing philosophers. Examples of extremely negative words in men's reviews are "lazy" and "bitter", and in women's reviews "delusional", "tarnished", and "oversexed". There are also positive words used in both men's and women's reviews. Overlapping positive words include "masterpieces" and "approve".
I conclude that men write more matter-of-factly, while women write more subjectively and from their own perspective. Wikipedia, a supposed source of truth, surprisingly has a much more positive average sentiment for men's bio pages than for women's. Wikipedia also appears to describe men's work as more scientific than women's, even though both use the word "quantitatively" in their texts and only women use the word "statistic". Perhaps there is bias in Wikipedia. The sentiment analysis of the Google dataset produced no major takeaways: men's and women's works have roughly the same sentiment in Google reviews. In the word clouds, however, we do see strong sentiments in both directions.
!jupyter nbconvert --to html men_and_women_of_philosophy.ipynb